WHU-Mix (vector) building dataset

The WHU-Mix dataset consists of 64k image tiles with over 754k buildings and covers an area of about 1100 km². It is mainly intended for segmentation-based methods, whereas this work focuses on extracting building polygons directly in vector format. To support this, we took the corresponding raw data, edited it, and named the result the WHU-Mix (vector) dataset. The building labels in the WHU-Mix (vector) dataset are stored in MS-COCO format [1].

Table 1 lists the composition of the WHU-Mix (vector) dataset. It contains data from the WHU [2], Crowd AI [3], Open AI [4], SpaceNet [5], and Inria [6] datasets, as well as new data that we collected. The annotation formats (GeoJSON, JSON) provided by Crowd AI, Open AI, and SpaceNet can be losslessly converted to MS-COCO format. The SpaceNet and Inria datasets contain many offset and missing annotations; we corrected these manually, which significantly improved the annotation accuracy. Data from other regions (Data source: New) are annotated in shapefile format and can also be converted directly to vector annotations. Constructing the WHU-Mix (vector) dataset took about six months.
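As an illustration of the GeoJSON to MS-COCO conversion mentioned above, the following is a minimal sketch and not the conversion script used to build the dataset. It assumes that the polygon coordinates are already expressed in pixel units of the tile, and it keeps only the exterior ring of each polygon. The function names (ring_area, feature_to_coco) and the default category_id are hypothetical.

```python
def ring_area(ring):
    """Absolute area of a polygon ring via the shoelace formula.

    Works whether or not the ring repeats its first vertex at the end.
    """
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def feature_to_coco(feature, image_id, ann_id, category_id=1):
    """Convert one GeoJSON Polygon feature into a COCO-style annotation dict.

    Assumption: coordinates are pixel coordinates of the tile, not
    longitude/latitude. Interior rings (holes) are ignored in this sketch.
    """
    rings = feature["geometry"]["coordinates"]
    exterior = [(float(p[0]), float(p[1])) for p in rings[0]]

    xs = [x for x, _ in exterior]
    ys = [y for _, y in exterior]

    # COCO stores a polygon as one flat list [x0, y0, x1, y1, ...].
    flat = [value for point in exterior for value in point]

    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [flat],
        "area": ring_area(exterior),
        # COCO bounding boxes are [x_min, y_min, width, height].
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "iscrowd": 0,
    }


if __name__ == "__main__":
    # A hypothetical 10 x 5 pixel rectangular building footprint.
    example = {
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [10, 0], [10, 5], [0, 5], [0, 0]]],
        },
        "properties": {},
    }
    print(feature_to_coco(example, image_id=1, ann_id=1))
```

Annotations that are georeferenced (for example, in longitude and latitude) would first need to be mapped into tile pixel coordinates using the geotransform of the image, which this sketch does not cover.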

To validate the performance of the proposed method from multiple perspectives, we divided the dataset into a training set, a validation set, and two test sets (test sets I and II). First, we mixed the tiles of Nos. 1–8 and divided them into a training set (43,778 tiles), a validation set (2,922 tiles), and test set I (11,675 tiles) at a ratio of 15:1:4. The tiles of Nos. 9–12 were then mixed to form test set II (6,011 tiles). Test set II is more challenging: it has no geographical overlap with the training set, so it can be used to further validate the generalization ability of a method in a more realistic setting.
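A ratio-based split of this kind can be sketched as follows. This is an illustrative sketch, not the procedure actually used to produce the published splits: the function name split_by_ratio, the fixed random seed, and the rounding of the split boundaries are all assumptions, so the resulting counts may differ from the published ones by a few tiles.

```python
import random


def split_by_ratio(items, ratios=(15, 1, 4), seed=0):
    """Shuffle items and cut them into consecutive parts sized by `ratios`.

    The parts are disjoint and together contain every item exactly once.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility

    total = sum(ratios)
    n = len(items)

    # Cumulative cut points, rounded to the nearest whole item.
    cuts = []
    running = 0
    for r in ratios[:-1]:
        running += r
        cuts.append(round(n * running / total))
    cuts.append(n)

    parts = []
    start = 0
    for cut in cuts:
        parts.append(items[start:cut])
        start = cut
    return parts


if __name__ == "__main__":
    # Hypothetical tile identifiers standing in for the mixed tiles.
    tile_ids = [f"tile_{i:05d}" for i in range(200)]
    train, val, test_1 = split_by_ratio(tile_ids)
    print(len(train), len(val), len(test_1))
```

Test set II is not produced by such a random split: as described above, it is made up of tiles from separate regions so that it shares no geography with the training set.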

Note: ‘*’ denotes that we edited the incorrect labels in these datasets.

[1] T. Lin et al., "Microsoft COCO: Common Objects in Context," in Computer Vision - ECCV 2014, vol. 8693, pp. 740-755, 2014.

[2] S. Ji, S. Wei, and M. Lu, "Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set," IEEE Transactions on Geoscience and Remote Sensing, no. 99, pp. 1-13, 2018, doi: 10.1109/TGRS.2018.2858817.

[3] S. P. Mohanty, "CrowdAI dataset (2018)," 2018. [Online]. Available: https://www.crowdai.org/challenges/mapping-challenge/dataset_files.

[4] OpenAI, "2018 Open AI Tanzania Building Footprint Segmentation Challenge," 2018. [Online]. Available: https://competitions.codalab.org/competitions/20100.

[5] A. Van Etten, D. Lindenbaum, and T. M. Bacastow, "SpaceNet: A Remote Sensing Dataset and Challenge Series," arXiv preprint arXiv:1807.01232, 2018.

[6] E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, "Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark," in IGARSS 2017 - 2017 IEEE International Geoscience and Remote Sensing Symposium, 2017, pp. 3226-3229.